Abstract

It is well known that complete prior ignorance is not compatible with
learning, at least in a coherent theory of (epistemic) uncertainty. What is less
widely known is that there is a state similar to full ignorance, which Walley
calls near-ignorance, that permits learning to take place. In this paper we
provide new and substantial evidence that near-ignorance, too, cannot really be
regarded as a way out of the problem of starting statistical inference in
conditions of very weak beliefs. The key to this result is focusing on a setting
characterized by a variable of interest that is latent. We argue that such a
setting is by far the most common case in practice, and we show, for the case
of categorical latent variables (and general manifest variables), that there is
a sufficient condition that, if satisfied, prevents learning from taking place
under prior near-ignorance. This condition is shown to be easily satisfied in
the most common statistical problems.
1 Introduction



Epistemic theories of statistics are often concerned with the question of prior
ignorance. Prior ignorance means that a subject, who is about to perform a
statistical analysis, has no substantial beliefs about the underlying
data-generating process. Yet, the subject would like to exploit the available
sample to draw some statistical inference, i.e., the subject would like to use
the data to learn, moving away from the initial condition of ignorance. This
situation is very important, as it is often desirable to start a statistical
analysis with weak assumptions about the problem of interest, thus trying to
implement an objective-minded approach to statistics.

A fundamental question is whether prior ignorance is compatible with learning.
Walley gives a negative answer for the case of his self-consistent (or coherent)
theory of statistics: he shows, in a very general sense, that vacuous prior
beliefs lead to vacuous posterior beliefs, irrespective of the type and amount
of observed data [Walley (1991), Section 7.3.7]. But, at the same time, he
proposes focusing on a slightly different state of beliefs, called
near-ignorance, that does enable learning to take place [Walley (1991),
Section 4.6.9]. Loosely speaking, near-ignorant beliefs are beliefs close but
not equal to vacuous (see Section 3). The possibility to learn under prior
near-ignorance is shown, for instance, in the special case of the near-ignorance
prior defining the imprecise Dirichlet model (IDM). This is a popular model used
in the case of inference from categorical data generated by a discrete process
([Walley (1996)]; [Bernard (2005)]).

In this paper, we also focus on a categorical random variable X, expressing the 
outcomes of a multinomial process, but we assume that such a variable is latent. 
This means that we cannot observe the realizations of X, so we can learn about 
it only by means of another (not necessarily categorical) variable S, related to X 
in some known way. Variable S is assumed to be manifest, in the sense that its 
realizations can be observed (see Section 2).

In such a setting, we introduce a condition in Section 4, related to the
likelihood of the observed data, that is shown to be sufficient to prevent
learning about X under prior near-ignorance. The condition is very general, as
it is developed for any prior that models near-ignorance (not only the one used
in the IDM), and for very general kinds of relation between X and S. We then
show, by simple examples, that such a condition is easily satisfied, even in the
most elementary and common statistical problems.

In order to appreciate this result, it is important to realize that latent variables 
are ubiquitous in problems of uncertainty. It can be argued, indeed, that there is a 
persistent distinction between (latent) facts (e.g., health, state of economy, color of 
a ball) and (manifest) observations of facts: one can regard them as being related 
by a so-called observational process; and the point is that these kinds of processes 
are imperfect in practice. Observational processes are often neglected in statistics, 
when their imperfection is deemed to be tiny. But a striking outcome of the present 
research is that, no matter how tiny the imperfection, provided it exists, learning is 
not possible under prior near-ignorance.

In our view, the present results raise serious doubts about the possibility of
adopting a condition of prior near-ignorance in real, as opposed to idealized,
applications of statistics. As a consequence, it may make sense to consider
re-focusing the research about this subject on developing models of very weak
states of belief that are, however, stronger than near-ignorance.



2 Categorical Latent Variables 

In this paper, we follow the general definition of latent and manifest variables
given by [Skrondal and Rabe-Hesketh (2004)]: a latent variable is a random
variable whose realizations are unobservable (hidden), while a manifest variable
is a random variable whose realizations can be directly observed. The concept of
latent variable is central in many sciences, for example psychology and
medicine. [Skrondal and Rabe-Hesketh (2004)] list several fields of application
and several phenomena that can be modeled using latent variables, and conclude
that latent variable modeling "pervades modern mainstream statistics," although
"this omni-presence of latent variables is commonly not recognized, perhaps
because latent variables are given different names in different literatures,
such as random effects, common factors and latent classes" or hidden variables.

But what are latent variables in practice? According to
[Boorsbom et al. (2002)], there may be different interpretations of latent
variables. A latent variable can be regarded, for example, as an unobservable
random variable that exists independently of the observation. An example is the
unobservable health status of a patient who is subject to a medical test.
Another possibility is to regard a latent variable as a product of the human
mind, a construct that does not exist independently of the observation. An
example is the unobservable state of the economy, often used in economic models.
In this paper, we assume the existence of a latent categorical random variable
X, with outcomes in $\mathcal{X} = \{x_1, \dots, x_k\}$ and unknown chances
$\theta \in \Theta := \{\theta = (\theta_1, \dots, \theta_k) \mid
\sum_{i=1}^{k} \theta_i = 1,\ 0 \le \theta_i \le 1\}$, without stressing any
particular interpretation.

Suppose now that our aim is to predict, after $N$ realizations of the variable
X, the next outcome (or the next $N'$ outcomes). Because the variable X is latent
and therefore unobservable by definition, the only possible way to learn something 
about the probabilities of the next outcome is to observe the realizations of some 
manifest variable S related, in a known way, to the (unobservable) realizations of 
X. An example of known relationship between latent and manifest variables is the 
following. 

Example 1 We consider a binary medical diagnostic test used to assess the health
status of a patient with respect to a given disease. The accuracy of a
diagnostic test is determined by two probabilities: the sensitivity of a test is
the probability of obtaining a positive result if the patient is diseased; the
specificity is the probability of obtaining a negative result if the patient is
healthy. Medical tests are assumed to be imperfect indicators of the
unobservable true disease status of the patient. Therefore, we assume that the
probability of obtaining a positive result when the patient is healthy and the
probability of obtaining a negative result when the patient is diseased are both
non-zero. Suppose, to make things simpler, that the sensitivity and the
specificity of the test are known. In this example, the unobservable health
status of the patient can be considered as a binary latent variable X with
values in the set {Healthy, Ill}, while the result of the test can be considered
as a binary manifest variable S with values in the set {Negative result,
Positive result}. Because the sensitivity and the specificity of the test are
known, we know how X and S are related. ◊

We continue the discussion of this example later on, in the light of our
results, in Example 2 of Section 4.



3 Near-Ignorance Priors 

Consider a categorical random variable X with outcomes in
$\mathcal{X} = \{x_1, \dots, x_k\}$ and unknown chances $\theta \in \Theta$.
Suppose that we have no relevant prior information about $\theta$ and we are
therefore in a situation of prior ignorance. How should we model our prior
beliefs in order to reflect the initial lack of knowledge?

Let us give a brief overview of this topic in the case of coherent models of uncer- 
tainty, such as Bayesian probability and Walley's theory of coherent lower previsions. 

In the traditional Bayesian setting, prior beliefs are modeled using a single
prior probability distribution. The problem of defining a standard prior
probability distribution modeling a situation of prior ignorance, a so-called
noninformative prior, has been an important research topic in the last two
centuries$^{2}$ and, despite the numerous contributions, it remains an open
research issue, as illustrated by [Kass and Wassermann (1996)]. See also
[Hutter (2006)] for recent developments and complementary considerations. There
are many principles and properties that are desirable to model a situation of
prior ignorance and that have been used in past research to define
noninformative priors. For example, Laplace's symmetry or indifference principle
has suggested, in the case of finite possibility spaces, the use of the uniform
distribution. Other principles, for example the principle of invariance under
group transformations, the maximum entropy principle, and the conjugate priors
principle, have suggested the use of other noninformative priors, in particular
for continuous possibility spaces, satisfying one or more of these principles.
But, in general, it has proven to be difficult to define a standard
noninformative prior satisfying, at the same time, all the desirable principles.



1 For further details about the modeling of diagnostic accuracy with latent
variables see [Yang and Becker (1997)].

2 Starting from the work of Laplace at the beginning of the 19th century
[Laplace (1820)].
In the case of finite possibility spaces, we agree with
[De Cooman and Miranda (2006)] when they say that there are at least two
principles that should be satisfied to model a situation of prior ignorance: the
symmetry principle and the embedding principle. The symmetry principle states
that, if we are completely ignorant a priori about $\theta$, then we have no
reason to favour one possible outcome of X over another, and therefore our
probability model on $\theta$ should be symmetric. This principle recalls
Laplace's symmetry or indifference principle that, in the past decades, has
suggested the use of the uniform prior as standard noninformative prior. The
embedding principle states that, for each possible event A, the probability
assigned to A should not depend on the possibility space $\mathcal{X}$ in which
A is embedded. In particular, the probability assigned a priori to the event A
should be invariant with respect to refinements and coarsenings of
$\mathcal{X}$. It is easy to show that the embedding principle is not satisfied
by the uniform distribution. How should we model our prior ignorance in order to
satisfy these two principles? [Walley (1991)] gives a compelling answer to this
question: he proves$^{3}$ that the only probability model consistent with
coherence and with the two principles is the vacuous probability model, i.e.,
the model that assigns, for each non-trivial event A, lower probability
$\underline{P}(A) = 0$ and upper probability $\overline{P}(A) = 1$. It is evident
that this model cannot be expressed using a single probability distribution. It
follows that, to model properly and in a coherent way a situation of prior
ignorance, we need imprecise probabilities.$^{4}$

Unfortunately, adopting the vacuous probability model for X is not a practical
solution to our initial problem, because it produces only vacuous posterior
probabilities. [Walley (1991)] suggests, as a practical solution, the use of
near-ignorance priors. A near-ignorance prior is a large closed convex set
$\mathcal{M}_0$ of probability distributions for $\theta$, very close to the
vacuous probability model, which produces a priori vacuous expectations for
various functions $f$ on $\Theta$, i.e., such that
$\underline{E}(f) = \inf_{\theta \in \Theta} f(\theta)$ and
$\overline{E}(f) = \sup_{\theta \in \Theta} f(\theta)$.

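An example of near-ignorance prior that is particularly instructive is the set
of priors $\mathcal{M}_0$ used in the imprecise Dirichlet model (IDM). The IDM
models a situation of prior ignorance about the chances $\theta$ of a
categorical random variable X. The near-ignorance prior $\mathcal{M}_0$ used in
the IDM consists of the set of all Dirichlet densities
$p(\theta) = \mathrm{dir}_{s,t}(\theta)$ for a fixed $s > 0$ and all $t \in T$,
where

$$\mathrm{dir}_{s,t}(\theta) := \frac{\Gamma(s)}{\prod_{i=1}^{k} \Gamma(s t_i)}
\prod_{i=1}^{k} \theta_i^{s t_i - 1}, \qquad (1)$$

and

$$T := \{t = (t_1, \dots, t_k) \mid \sum_{j=1}^{k} t_j = 1,\ 0 < t_j < 1\}.
\qquad (2)$$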



3 In Note 7, p. 526. See also Section 5.5.

4 For a complementary point of view, see [Hutter (2006)].
The particular choice of $\mathcal{M}_0$ in the IDM implies vacuous prior
expectations for all functions $f(\theta) = \theta_i^{N'}$, for all $N' \ge 1$
and all $i \in \{1, \dots, k\}$, i.e., $\underline{E}(\theta_i^{N'}) = 0$ and
$\overline{E}(\theta_i^{N'}) = 1$. Choosing $N' = 1$, we have, a priori,

$$\underline{P}(X = x_i) = \underline{E}(\theta_i) = 0, \qquad
\overline{P}(X = x_i) = \overline{E}(\theta_i) = 1.$$

It follows that the particular near-ignorance prior $\mathcal{M}_0$ used in the
IDM implies vacuous prior probabilities for each possible outcome of the
variable X. It can be shown that this particular set of priors satisfies both
the symmetry and embedding principles.
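The vacuous prior expectations stated above can be checked numerically. The
following is a minimal sketch, assuming only the standard fact that under a
Dirichlet density with parameters $s t$ the marginal of $\theta_i$ is a
Beta($s t_i$, $s(1 - t_i)$) distribution; the function name and the value
$s = 2$ are illustrative choices, not part of the paper.

```python
def dirichlet_power_mean(s, t_i, n_prime):
    """E[theta_i ** n_prime] when theta has a Dirichlet density dir_{s,t}.

    Marginally theta_i ~ Beta(s * t_i, s * (1 - t_i)), whose n-th raw moment is
    prod_{m=0}^{n-1} (s * t_i + m) / (s + m).
    """
    value = 1.0
    for m in range(n_prime):
        value *= (s * t_i + m) / (s + m)
    return value


if __name__ == "__main__":
    s = 2.0  # illustrative choice of the IDM hyperparameter
    # Letting t_i approach 0 or 1 drives the expectation towards 0 or 1,
    # which is why the infimum and supremum over T are vacuous.
    for t_i in (1e-9, 0.5, 1 - 1e-9):
        print(t_i, [dirichlet_power_mean(s, t_i, n) for n in (1, 2, 5)])
```

Since $T$ is open, the bounds 0 and 1 are approached but not attained by any
single prior in the set.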

But what is the difference between the vacuous probability model and the
near-ignorance prior used in the IDM? Although both models produce vacuous prior
probabilities and both models satisfy the symmetry and embedding principles, the
IDM yields posterior probabilities that are not vacuous, while the vacuous
probability model produces only vacuous posterior probabilities. The answer to
this question is the reason why we use the term near-ignorance: in the IDM,
although we are completely ignorant about the possible outcomes of the variable
X, we are not completely ignorant about the chances $\theta$, because we assume
a particular class of prior distributions, i.e., the Dirichlet distributions for
a fixed value of $s$.
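To make the non-vacuous posterior of the IDM concrete, here is a small sketch.
It assumes the standard closed-form IDM posterior bounds from the literature,
$n_i/(N+s)$ and $(n_i+s)/(N+s)$, which are not derived in this section; the
function name and the default $s = 2$ are illustrative.

```python
def idm_posterior_bounds(counts, s=2.0):
    """Lower and upper posterior probabilities of each category in the IDM.

    counts[i] is the number of (directly observed) outcomes of category i.
    Assumes the standard IDM expressions n_i / (N + s) and (n_i + s) / (N + s).
    """
    total = sum(counts)
    return [(n / (total + s), (n + s) / (total + s)) for n in counts]


if __name__ == "__main__":
    print(idm_posterior_bounds([0, 0]))   # no data: vacuous bounds
    print(idm_posterior_bounds([3, 7]))   # ten observations: bounds tighten
```

With no data the bounds are vacuous, and they narrow as observations
accumulate, which is the learning behaviour described above for observable
variables.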



4 Limits of Learning under Prior Near-Ignorance 



Consider a sequence of independent and identically distributed (IID) categorical
latent variables $(X_i)_{i \in \mathbb{N}}$ with outcomes in $\mathcal{X}$ and
unknown chances $\theta \in \Theta$, and a sequence of independent manifest
variables $(S_i)_{i \in \mathbb{N}}$. We assume that a realization of the
manifest variable $S_i$ can be observed only after an (unobservable) realization
of the latent variable $X_i$, and that the probability distribution of $S_i$
given $X_i$ is known for each $i \in \mathbb{N}$. Furthermore, we assume $S_i$
to be independent of the chances $\theta$ of $X_i$ given $X_i$. Define the
random variables $\mathbf{X} := (X_1, \dots, X_N)$,
$\mathbf{S} := (S_1, \dots, S_N)$ and
$\mathbf{X}' := (X_{N+1}, \dots, X_{N+N'})$.

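We focus on the problem of predictive inference.$^{5}$ Suppose that we observe a
dataset $\mathbf{s}$ of realizations of manifest variables $S_1, \dots, S_N$
related to the (unobservable) dataset $\mathbf{x} \in \mathcal{X}^N$ of
realizations of the variables $X_1, \dots, X_N$. Using the notation defined
above we have $\mathbf{S} = \mathbf{s}$ and $\mathbf{X} = \mathbf{x}$. Our aim is
to predict the outcomes of the next $N'$ variables $X_{N+1}, \dots, X_{N+N'}$.
In particular, given $\mathbf{x}' \in \mathcal{X}^{N'}$, our aim is to calculate
$\underline{P}(\mathbf{X}' = \mathbf{x}' \mid \mathbf{S} = \mathbf{s})$ and
$\overline{P}(\mathbf{X}' = \mathbf{x}' \mid \mathbf{S} = \mathbf{s})$. To
simplify notation, when no confusion is possible, we denote in the rest of the
paper $\mathbf{S} = \mathbf{s}$ with $\mathbf{s}$ and
$\mathbf{X}' = \mathbf{x}'$ with $\mathbf{x}'$. The (in)dependence structure can
be depicted graphically as follows:

[Figure: graph of the (in)dependence structure, with $\theta \to X_i \to S_i$
for each $i$.]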




5 For a general presentation of predictive inference see [Geisser (1993)]; for a
discussion of the imprecise probability approach to predictive inference see
[Walley et al. (1999)].
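Modelling our prior ignorance about the parameters $\theta$ with a
near-ignorance prior $\mathcal{M}_0$ and denoting by
$\mathbf{n}' := (n'_1, \dots, n'_k)$ the frequencies of the dataset
$\mathbf{x}'$, we have

$$\underline{P}(\mathbf{x}' \mid \mathbf{s})
= \inf_{p \in \mathcal{M}_0} \int_{\Theta} \prod_{i=1}^{k} \theta_i^{n'_i}\,
p(\theta \mid \mathbf{s})\, d\theta
= \underline{E}\Big(\prod_{i=1}^{k} \theta_i^{n'_i} \,\Big|\, \mathbf{s}\Big),
\qquad (3)$$

where, according to Bayes theorem,

$$p(\theta \mid \mathbf{s}) = \frac{P(\mathbf{s} \mid \theta)\, p(\theta)}
{\int_{\Theta} P(\mathbf{s} \mid \theta)\, p(\theta)\, d\theta},$$

provided that $\int_{\Theta} P(\mathbf{s} \mid \theta)\, p(\theta)\, d\theta
\neq 0$. Analogously, substituting sup for inf in (3), we obtain

$$\overline{P}(\mathbf{x}' \mid \mathbf{s})
= \overline{E}\Big(\prod_{i=1}^{k} \theta_i^{n'_i} \,\Big|\, \mathbf{s}\Big).$$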

The central problem now is to choose $\mathcal{M}_0$ so as to be as ignorant as
possible a priori and, at the same time, to be able to learn something from the
observed dataset of manifest variables $\mathbf{s}$. Theorem 1 and the following
corollaries yield a first partial solution to the above problem, stating several
conditions for learning under prior near-ignorance.

Theorem 1 Let $\mathbf{s}$ be given. Consider a bounded continuous function $f$
defined on $\Theta$ and denote with $f_{\max}$ the supremum of $f$ on $\Theta$.
If the likelihood function $P(\mathbf{s} \mid \theta)$ is strictly
positive$^{6}$ in each point in which $f$ reaches its maximum value $f_{\max}$
and it is continuous in an arbitrary small neighborhood of these points, and
$\mathcal{M}_0$ is such that a priori $\overline{E}(f) = f_{\max}$, then

$$\overline{E}(f \mid \mathbf{s}) = \overline{E}(f) = f_{\max}.$$

6 The assumption about $P(\mathbf{s} \mid \theta)$ in Theorem 1 can be
substituted by the following weaker assumption. For a given arbitrary small
$\delta > 0$, denote with $\Theta_\delta$ the measurable set
$\Theta_\delta := \{\theta \in \Theta \mid f(\theta) \ge f_{\max} - \delta\}$.
If $P(\mathbf{s} \mid \theta)$ is such that
$\lim_{\delta \to 0} \inf_{\theta \in \Theta_\delta} P(\mathbf{s} \mid \theta)
= c > 0$, then Theorem 1 holds.

Many corollaries to Theorem 1 are listed in Section B of the Appendix. Here we
discuss only the most important corollary. Consider, given a dataset
$\mathbf{x}'$, the particular function
$f(\theta) = \prod_{i=1}^{k} \theta_i^{n'_i}$. This function is particularly
important for predictive inference, because its lower and upper expectations
correspond to the lower and upper probabilities assigned to the dataset
$\mathbf{x}'$. It is easy to show that, in this case, the minimum of $f$ is 0
and is reached in all the points $\theta \in \Theta$ with $\theta_i = 0$ for
some $i$ such that $n'_i > 0$, while the maximum of $f$ is reached in a single
point of $\Theta$ corresponding to the relative frequencies $\mathbf{f}$ of the
sample $\mathbf{x}'$, i.e., at
$\mathbf{f} = (n'_1/N', \dots, n'_k/N') \in \Theta$, and the maximum of $f$ is
given by $\prod_{i=1}^{k} (n'_i/N')^{n'_i}$. It follows that vacuous
probabilities regarding the dataset $\mathbf{x}'$ are given by

$$\underline{P}(\mathbf{x}') = \underline{E}\Big(\prod_{i=1}^{k}
\theta_i^{n'_i}\Big) = 0, \qquad
\overline{P}(\mathbf{x}') = \overline{E}\Big(\prod_{i=1}^{k}
\theta_i^{n'_i}\Big) = \prod_{i=1}^{k} \Big(\frac{n'_i}{N'}\Big)^{n'_i}.$$
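The statement that $\prod_i \theta_i^{n'_i}$ is maximized at the relative
frequencies of $\mathbf{x}'$ and vanishes on the relevant faces of the simplex
can be checked with a short sketch. The function names and the example counts
below are hypothetical and serve only as an illustration.

```python
import random
from math import prod


def dataset_probability(theta, counts):
    """f(theta) = prod_i theta_i ** n'_i for a dataset with frequencies counts."""
    return prod(th ** n for th, n in zip(theta, counts))


def vacuous_upper_probability(counts):
    """Value of f at the relative frequencies n'_i / N' (its maximum)."""
    total = sum(counts)
    return prod((n / total) ** n for n in counts)


if __name__ == "__main__":
    counts = [2, 1, 0]                      # hypothetical future dataset
    upper = vacuous_upper_probability(counts)
    random.seed(0)
    for _ in range(1000):
        raw = [random.random() for _ in counts]
        theta = [r / sum(raw) for r in raw]  # a random point of the simplex
        assert dataset_probability(theta, counts) <= upper + 1e-12
    # f is zero wherever theta_i = 0 for a category with n'_i > 0
    print(upper, dataset_probability([0.0, 1.0, 0.0], counts))
```

The random search is only a sanity check; the maximizer itself follows from the
usual multinomial maximum-likelihood argument.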

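Corollary 1 Let $\mathbf{s}$ be given and let $P(\mathbf{s} \mid \theta)$ be a
continuous strictly positive function on $\Theta$. Then, if $\mathcal{M}_0$
implies vacuous prior probabilities for a dataset
$\mathbf{x}' \in \mathcal{X}^{N'}$, the predictive probabilities of
$\mathbf{x}'$ are vacuous also a posteriori, after having observed $\mathbf{s}$,
i.e.,

$$\underline{P}(\mathbf{x}' \mid \mathbf{s}) = \underline{P}(\mathbf{x}') = 0,$$

$$\overline{P}(\mathbf{x}' \mid \mathbf{s}) = \overline{P}(\mathbf{x}')
= \prod_{i=1}^{k} \Big(\frac{n'_i}{N'}\Big)^{n'_i}.$$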

In other words, Corollary 1 states a sufficient condition that prevents learning
from taking place under prior near-ignorance: if the likelihood function
$P(\mathbf{s} \mid \theta)$ is continuous and strictly positive on $\Theta$,
then all the datasets $\mathbf{x}' \in \mathcal{X}^{N'}$ for which
$\mathcal{M}_0$ implies vacuous probabilities have vacuous probabilities also a
posteriori, after having observed $\mathbf{s}$. It follows that, if this
sufficient condition is satisfied, we cannot use near-ignorance priors to model
a state of prior ignorance, for the same reason for which, in Section 3, we have
excluded the vacuous probability model: because only vacuous posterior
probabilities are produced.

The sufficient condition described above is satisfied very often in practice, as 
illustrated by the following striking examples. 

Example 2 Consider the medical test introduced in Example 1 and an (ideally)
infinite population of individuals. Denote with the binary variable
$X_i \in \{H, I\}$ the health status of the $i$-th individual of the population
and with $S_i \in \{+, -\}$ the result of the diagnostic test applied to the
same individual. We assume that the variables in the sequence
$(X_i)_{i \in \mathbb{N}}$ are IID with unknown chances $(\theta, 1 - \theta)$,
where $\theta$ corresponds to the (unknown) proportion of diseased individuals
in the population. Denote with $1 - \varepsilon_1$ the specificity and with
$1 - \varepsilon_2$ the sensitivity of the test. Then it holds that

$$P(S_i = + \mid X_i = H) = \varepsilon_1 > 0,$$

$$P(S_i = - \mid X_i = I) = \varepsilon_2 > 0,$$

where $(I, H, +, -)$ denote (patient ill, patient healthy, test positive, test
negative).

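Suppose that we observe the results of the test applied to $N$ different
individuals of the population; using our previous notation we have
$\mathbf{S} = \mathbf{s}$. For each individual we have

$$P(S_i = + \mid \theta)
= P(S_i = + \mid X_i = I)\, P(X_i = I \mid \theta)
+ P(S_i = + \mid X_i = H)\, P(X_i = H \mid \theta)
= \underbrace{(1 - \varepsilon_2)}_{>0} \cdot \theta
+ \underbrace{\varepsilon_1}_{>0} \cdot (1 - \theta) > 0.$$

Analogously,

$$P(S_i = - \mid \theta)
= P(S_i = - \mid X_i = I)\, P(X_i = I \mid \theta)
+ P(S_i = - \mid X_i = H)\, P(X_i = H \mid \theta)
= \underbrace{\varepsilon_2}_{>0} \cdot \theta
+ \underbrace{(1 - \varepsilon_1)}_{>0} \cdot (1 - \theta) > 0.$$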



Denote with $n^{\mathbf{s}}$ the number of positive tests in the observed sample
$\mathbf{s}$. Then, because the variables are independent, we have

$$P(\mathbf{S} = \mathbf{s} \mid \theta)
= \big((1 - \varepsilon_2) \cdot \theta + \varepsilon_1 \cdot (1 - \theta)
\big)^{n^{\mathbf{s}}}
\cdot \big(\varepsilon_2 \cdot \theta + (1 - \varepsilon_1) \cdot (1 - \theta)
\big)^{N - n^{\mathbf{s}}} > 0$$

for each $\theta \in [0, 1]$ and each $\mathbf{s} \in \{+, -\}^N$. Therefore,
according to Corollary 1, all the predictive probabilities that, according to
$\mathcal{M}_0$, are vacuous a priori remain vacuous a posteriori. It follows
that, if we want to avoid vacuous posterior predictive probabilities, then we
cannot model our prior knowledge (ignorance) using a near-ignorance prior
implying some vacuous prior predictive probabilities. This simple example shows
that our previous theoretical results raise serious questions about the use of
near-ignorance priors even in very simple, common, and important situations.

The situation presented in this example can be extended, in a straightforward
way, to the general categorical case and has been studied, in the special case
of the near-ignorance prior used in the imprecise Dirichlet model, in
[Piatti et al. (2005)].
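The effect described in Example 2 can be illustrated numerically. The sketch
below assumes the binary IDM prior set, i.e., Beta($s t$, $s(1-t)$) densities
with $0 < t < 1$, and computes the posterior expectation of $\theta$ exactly by
expanding the likelihood into monomials. The function names and the sample
values (3 positive and 7 negative tests, error rates 0.1, $s = 2$) are
hypothetical.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def posterior_mean_ill(n_pos, n_neg, eps1, eps2, s, t):
    """E[theta | s] for a Beta(s*t, s*(1-t)) prior and imperfect test results.

    eps1 = P(+ | healthy) and eps2 = P(- | ill), as in Example 2. The
    likelihood is expanded into monomials theta**u * (1 - theta)**v, whose
    expectations under a Beta density are ratios of Beta functions.
    """
    alpha, beta = s * t, s * (1.0 - t)
    n = n_pos + n_neg
    base = log_beta(alpha, beta)
    numerator = denominator = 0.0
    for j in range(n_pos + 1):       # positive results coming from ill patients
        for l in range(n_neg + 1):   # negative results coming from ill patients
            weight = (comb(n_pos, j) * (1 - eps2) ** j * eps1 ** (n_pos - j)
                      * comb(n_neg, l) * eps2 ** l * (1 - eps1) ** (n_neg - l))
            u, v = j + l, n - j - l
            denominator += weight * exp(log_beta(alpha + u, beta + v) - base)
            numerator += weight * exp(log_beta(alpha + u + 1, beta + v) - base)
    return numerator / denominator


if __name__ == "__main__":
    for t in (1e-9, 0.5, 1 - 1e-9):
        imperfect = posterior_mean_ill(3, 7, 0.1, 0.1, 2.0, t)
        perfect = posterior_mean_ill(3, 7, 0.0, 0.0, 2.0, t)
        print(t, imperfect, perfect)
```

With error rates of 0.1 the posterior expectation can be pushed arbitrarily
close to 0 or 1 by moving $t$ towards the boundary of $T$, so its lower and
upper bounds over the prior set stay vacuous. With a perfect test
($\varepsilon_1 = \varepsilon_2 = 0$) the same computation reduces to
$(s t + n^{\mathbf{s}})/(s + N)$, which stays away from 0 and 1.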

Example 2 focuses on discrete latent and manifest variables. In the next example,
we show that our theoretical results have important implications also in models with 
discrete latent variables and continuous manifest variables. 
Example 3 Consider the sequence of IID categorical variables
$(X_i)_{i \in \mathbb{N}}$ with outcomes in $\mathcal{X}$ and unknown chances
$\theta \in \Theta$. Suppose that, for each $i \ge 1$, after a realization of
the latent variable $X_i$, we can observe a realization of a continuous manifest
variable $S_i$. Assume that $p(S_i \mid X_i = x_j)$ is a continuous positive
probability density, e.g., a normal $N(\mu_j, \sigma_j^2)$ density, for each
$x_j \in \mathcal{X}$. We have

$$p(S_i \mid \theta)
= \sum_{x_j \in \mathcal{X}} p(S_i \mid X_i = x_j) \cdot P(X_i = x_j \mid \theta)
= \sum_{x_j \in \mathcal{X}} \underbrace{p(S_i \mid X_i = x_j)}_{>0}
\cdot \theta_j > 0,$$

because $\theta_j$ is positive for at least one $j \in \{1, \dots, k\}$ and we
have assumed $S_i$ to be independent of $\theta$ given $X_i$. Because we have
assumed $(S_i)_{i \in \mathbb{N}}$ to be a sequence of independent variables, we
have

$$p(\mathbf{S} = \mathbf{s} \mid \theta)
= \prod_{i=1}^{N} p(S_i = s_i \mid \theta) > 0.$$

Therefore, according to Corollary 1, if we model our prior knowledge using a
near-ignorance prior $\mathcal{M}_0$, the vacuous prior predictive probabilities
implied by $\mathcal{M}_0$ remain vacuous a posteriori. It follows that, if we
want to avoid vacuous posterior predictive probabilities, we cannot model our
prior knowledge using a near-ignorance prior implying some vacuous prior
predictive probabilities. ◊
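A short sketch can make the positivity argument of Example 3 tangible. It
assumes normal densities for the manifest variable given each latent category;
the function names, means, standard deviations and observations are
hypothetical.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. deviation sigma."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))


def mixture_likelihood(theta, observations, mus, sigmas):
    """p(s | theta) = prod_i sum_j p(s_i | X_i = x_j) * theta_j."""
    value = 1.0
    for s_i in observations:
        value *= sum(th * normal_pdf(s_i, mu, sg)
                     for th, mu, sg in zip(theta, mus, sigmas))
    return value


if __name__ == "__main__":
    observations = [0.1, 1.9, 1.2]            # hypothetical manifest data
    mus, sigmas = [0.0, 2.0], [1.0, 1.0]      # hypothetical emission densities
    # The likelihood is positive even at the vertices of the simplex, which is
    # the condition required by Corollary 1.
    for theta in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5]):
        print(theta, mixture_likelihood(theta, observations, mus, sigmas))
```

For long samples the product may underflow in floating point; summing
logarithms would be the usual remedy, but the short form keeps the link to the
formula visible.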

Examples 2 and 3 raise serious concerns about the use of near-ignorance priors
in practical applications.

The only predictive model in the literature of which we are aware where a
near-ignorance prior is used successfully to obtain non-vacuous posterior
predictive probabilities is the IDM. In the next example, we explain how the IDM
avoids our theoretical limitations.

Example 4 In the IDM, we assume that the IID categorical variables
$(X_i)_{i \in \mathbb{N}}$ are observable. In other words, we have $S_i = X_i$
for each $i \ge 1$ and therefore the IDM is not a latent variable model. Having
observed $\mathbf{S} = \mathbf{X} = \mathbf{x}$, we have

$$P(\mathbf{S} = \mathbf{x} \mid \theta) = P(\mathbf{X} = \mathbf{x} \mid \theta)
= \prod_{i=1}^{k} \theta_i^{n_i},$$

where $n_i$ denotes the number of times that $x_i \in \mathcal{X}$ has been
observed in $\mathbf{x}$. We have $P(\mathbf{X} = \mathbf{x} \mid \theta) = 0$
for all $\theta$ such that $\theta_j = 0$ for at least one $j$ such that
$n_j > 0$, and $P(\mathbf{X} = \mathbf{x} \mid \theta) > 0$ for all the other
$\theta \in \Theta$, in particular for all $\theta$ in the interior of $\Theta$.

The near-ignorance prior $\mathcal{M}_0$ used in the IDM consists of the set of
all the Dirichlet densities $\mathrm{dir}_{s,t}(\theta)$ for a fixed $s > 0$ and
all $t \in T$, where $\mathrm{dir}_{s,t}(\theta)$ and $T$ have been defined in
(1) and (2).
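The particular choice of $\mathcal{M}_0$ in the IDM implies, for each
$N' \ge 1$ and each $i \in \{1, \dots, k\}$, that

$$\underline{E}(\theta_i^{N'}) = 0, \qquad \overline{E}(\theta_i^{N'}) = 1.$$

Consequently, denoting with $\mathbf{d}^i \in \mathcal{X}^{N'}$ the dataset with
$n'_i = N'$ and $n'_j = 0$ for each $j \neq i$, a priori we have

$$\underline{P}(\mathbf{X}' = \mathbf{d}^i) = 0, \qquad
\overline{P}(\mathbf{X}' = \mathbf{d}^i) = 1,$$

and in particular

$$\underline{P}(X_1 = x_i) = 0, \qquad \overline{P}(X_1 = x_i) = 1.$$

It can be shown that other prior predictive probabilities are not vacuous. For
example, for $i \neq j$, we have

$$\overline{P}(X_1 = x_i, X_2 = x_j) = \overline{E}(\theta_i \theta_j)
= \sup_{t \in T} \frac{s\, t_i t_j}{s + 1} = \frac{s}{4(s + 1)}
< \frac{1}{4} = \sup_{\theta \in \Theta} \theta_i \theta_j.$$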

The IDM produces, for each possible observed data set $\mathbf{x}$, non-vacuous
posterior predictive probabilities for each possible future data set (see
[Walley (1996)]). This means that our previous theoretical limitations are
avoided in some way. To explain this result we consider two cases. We consider
firstly an observed data set $\mathbf{x}$ where we have observed at least two
different outcomes. Secondly, we consider a data set $\mathbf{x}$ formed
exclusively by outcomes of the same type, in other words, a data set of the type
$\mathbf{d}^i$.

In the first case we have that
$P(\mathbf{x} \mid \theta) = \prod_{j=1}^{k} \theta_j^{n_j}$ is equal to zero
for $\theta = \mathbf{e}^i$ for each $i \in \{1, \dots, k\}$. In fact,
$\theta_i = 1$ implies $\theta_j = 0$ for each $j \neq i$ and there is at least
one $j \neq i$ with $n_j > 0$. Therefore, the assumptions of Corollaries 4 and 5
are not satisfied. And in fact the IDM produces non-vacuous posterior predictive
probabilities for each data set that, a priori, has vacuous predictive
probabilities. On the other hand, all the datasets whose prior predictive
probability reaches its maximum in a relative frequency
$\mathbf{f} \in \Theta$ such that $P(\mathbf{x} \mid \mathbf{f}) > 0$ are
characterized by non-vacuous prior predictive probabilities.

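The second case yields similar results. The only difference is that
$P(\mathbf{d}^i \mid \theta) = \theta_i^{N}$ for a given
$i \in \{1, \dots, k\}$. In this case
$P(\mathbf{x} \mid \mathbf{e}^i) = 1 > 0$ and in fact, according to Corollaries
4 and 5, we obtain

$$\overline{P}(x_i \mid \mathbf{x}) = \overline{P}(x_i) = 1,$$

$$\overline{P}(\mathbf{X}' = \mathbf{d}^i \mid \mathbf{x})
= \overline{P}(\mathbf{d}^i) = 1,$$

and consequently, for each $j \neq i$ and each $\mathbf{y} \neq \mathbf{d}^i$,

$$\underline{P}(x_j \mid \mathbf{x}) = \underline{P}(x_j) = 0,$$

$$\underline{P}(\mathbf{X}' = \mathbf{y} \mid \mathbf{x})
= \underline{P}(\mathbf{y}) = 0.$$

But, on the other hand, we obtain

$$\underline{P}(x_i \mid \mathbf{x}) > 0, \qquad
\underline{P}(\mathbf{X}' = \mathbf{d}^i \mid \mathbf{x}) > 0,$$

$$\overline{P}(x_j \mid \mathbf{x}) < 1, \qquad
\overline{P}(\mathbf{X}' = \mathbf{y} \mid \mathbf{x}) < 1,$$

and therefore the posterior predictive probabilities are not vacuous for each
possible future data set.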



Yet, since the variables $(X_i)_{i \in \mathbb{N}}$ are assumed to be
observable, the successful application of a near-ignorance prior in the IDM is
not helpful in addressing the doubts raised by our theoretical results about the
applicability of near-ignorance priors in situations where the variables
$(X_i)_{i \in \mathbb{N}}$ are latent.

5 Conclusions 

In this paper we have proved a sufficient condition that prevents learning about
a latent categorical variable from taking place under prior near-ignorance about
the data-generating process.

The condition holds as soon as the likelihood is strictly positive (and
continuous), and so is satisfied frequently, even in the simplest settings.
Taking into account that the considered framework is very general and pervasive
in statistical practice, we regard this result as a form of substantial evidence
against the possibility of using prior near-ignorance in real statistical
problems. Given that complete prior ignorance is not compatible with learning,
as is well known, we deduce that there is little hope of using any form of prior
ignorance to do objective-minded statistical inference in practice.

As a consequence, we suggest that future research efforts should be directed to
studying and developing new forms of knowledge that are close to near-ignorance
but that do not coincide with it.
A Technical preliminaries

In this appendix we provide some technical results that are used to prove the
theorems in the paper. First of all, we introduce some notation used in this
appendix. Consider a sequence of probability densities
$(p_n)_{n \in \mathbb{N}}$ and a function $f$ defined on a set $\Theta$. Then,
we use the notation

$$E_n(f) := \int_{\Theta} f(\theta)\, p_n(\theta)\, d\theta,$$

$$P_n(\tilde{\Theta}) := \int_{\tilde{\Theta}} p_n(\theta)\, d\theta, \qquad
\tilde{\Theta} \subseteq \Theta.$$

In addition, for a given probability density $p$ on $\Theta$,

$$E_p(f) := \int_{\Theta} f(\theta)\, p(\theta)\, d\theta,$$

$$P_p(\tilde{\Theta}) := \int_{\tilde{\Theta}} p(\theta)\, d\theta, \qquad
\tilde{\Theta} \subseteq \Theta.$$

Finally, with $\to$ we denote $\lim_{n \to \infty}$.

Theorem 2 Let $\Theta \subset \mathbb{R}^k$ be the closed $k$-dimensional
simplex and let $(p_n)_{n \in \mathbb{N}}$ be a sequence of probability
densities defined on $\Theta$ w.r.t. the Lebesgue measure. Let $f \ge 0$ be a
bounded continuous function on $\Theta$ and denote with $f_{\max}$ the supremum
of $f$ on $\Theta$. For this function define the measurable sets

$$\Theta_\delta = \{\theta \in \Theta \mid f(\theta) \ge f_{\max} - \delta\}.
\qquad (4)$$

Assume that $(p_n)_{n \in \mathbb{N}}$ concentrates on a maximum of $f$ for
$n \to \infty$, in the sense that

$$E_n(f) \to f_{\max}, \qquad (5)$$

then, for all $\delta > 0$, it holds

$$P_n(\Theta_\delta) \to 1.$$

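Theorem 3 Let $L(\theta) \ge 0$ be a bounded measurable function with

$$\lim_{\delta \to 0} \inf_{\theta \in \Theta_\delta} L(\theta) =: c > 0,
\qquad (6)$$

under the same assumptions of Theorem 2. Then

$$\frac{E_n(L f)}{E_n(L)}
= \frac{\int_{\Theta} f(\theta)\, L(\theta)\, p_n(\theta)\, d\theta}
{\int_{\Theta} L(\theta)\, p_n(\theta)\, d\theta} \to f_{\max}. \qquad (7)$$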



Remark 1 If $f$ has a unique maximum in $\theta = \theta_0$ and $L$ is a
function, continuous in an arbitrary small neighborhood of $\theta = \theta_0$,
such that $L(\theta_0) > 0$, then (6) is satisfied.



B Corollaries to Theorem 1

The following Corollaries to Theorem 1 are necessary to prove Corollary 1 and
are useful to understand more deeply the limiting results implied by the use of
near-ignorance priors with latent variables.
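Corollary 2 Let $\mathbf{x}'$ and $\mathbf{s}$ be given. Denote with
$\mathbf{f} := (n'_1/N', \dots, n'_k/N') \in \Theta$ the vector of relative
frequencies of the dataset $\mathbf{x}'$. If $P(\mathbf{s} \mid \theta)$ is
continuous in an arbitrary small neighborhood of $\theta = \mathbf{f}$,
$P(\mathbf{s} \mid \mathbf{f}) > 0$ and $\mathcal{M}_0$ is such that

$$\overline{P}(\mathbf{x}') = \prod_{i=1}^{k} \Big(\frac{n'_i}{N'}\Big)^{n'_i},$$

then

$$\overline{P}(\mathbf{x}' \mid \mathbf{s}) = \overline{P}(\mathbf{x}').$$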

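Corollary 3 Let $\mathbf{x}'$ and $\mathbf{s}$ be given. If
$P(\mathbf{s} \mid \theta) > 0$ for each $\theta \in \Theta$ with
$\theta_i = 0$ for at least one $i$ with $n'_i > 0$, and $\mathcal{M}_0$ is such
that $\underline{P}(\mathbf{x}') = 0$, it follows that

$$\underline{P}(\mathbf{x}' \mid \mathbf{s}) = \underline{P}(\mathbf{x}') = 0.$$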

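Corollary 4 Let $\mathbf{s}$ be given. Consider an arbitrary
$x_i \in \mathcal{X}$ and denote with $\mathbf{e}^i$ the particular vector of
chances with $\theta_i = 1$ and $\theta_j = 0$ for each $j \neq i$. Suppose that
$\mathcal{M}_0$ is such that, a priori,
$\overline{P}(X_1 = x_i) := \overline{E}(\theta_i) = 1$. Then, if
$P(\mathbf{s} \mid \mathbf{e}^i) > 0$ and $P(\mathbf{s} \mid \theta)$ is
continuous in a neighborhood of $\theta = \mathbf{e}^i$, we have

$$\overline{P}(X_{N+1} = x_i \mid \mathbf{s}) = \overline{P}(X_1 = x_i) = 1,
\qquad (8)$$

and consequently,

$$\underline{P}(X_{N+1} = x_j \mid \mathbf{s}) = \underline{P}(X_1 = x_j) = 0,
\qquad (9)$$

for each $j \neq i$.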

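Corollary 5 Let $\mathbf{s}$ and $N'$ be given and consider an arbitrary
$x_i \in \mathcal{X}$. Suppose that $\mathcal{M}_0$ is such that, a priori,
$\overline{P}(X_1 = x_i) := \overline{E}(\theta_i) = 1$. Denote with
$\mathbf{d}^i \in \mathcal{X}^{N'}$ the data set with $n'_i = N'$ and
$n'_j = 0$ for each $j \neq i$. Then, if $P(\mathbf{s} \mid \mathbf{e}^i) > 0$
and $P(\mathbf{s} \mid \theta)$ is continuous in a neighborhood of
$\theta = \mathbf{e}^i$, we have

$$\overline{P}(\mathbf{X}' = \mathbf{d}^i \mid \mathbf{s}) = 1,$$

and consequently,

$$\underline{P}(\mathbf{X}' = \mathbf{y} \mid \mathbf{s}) = 0,$$

for each $\mathbf{y} \neq \mathbf{d}^i$.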